Forced alignment for speech synthesis databases using duration and prosodic phrase breaks

نویسنده

  • Arthur R. Toth
چکیده

Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break probabilities for each word. This last step is performed by a HMM whose parameters are determined in a novel way. The results of classifying word boundaries on a well-publicized database [1] were 65.7% accuracy on actual breaks and 92.2% overall.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning speaker-specific phrase breaks for text-to-speech systems

The objective of this paper is to investigate whether prosodic phrase breaks are specific to a speaker, and if so, propose a mechanism of learning speaker-specific phrase breaks from the speech database. Another equally important aspect dealt in this work is to demonstrate the usefulness of these speaker-specific phrase breaks for a text-to-speech system. Experiments are carried out on two diff...

متن کامل

A prosodic phrasing model for a Korean text-to-speech synthesis system

This paper presents a prosodic phrasing model for Korean to be used in a textto-speech synthesis (TTS) system. Read text corpora were morpho-syntactically parsed and prosodically labeled following the Penn Korean Treebank [Han et al., 2002] and K-ToBI prosodic labeling conventions [Sun-Ah, 2000] respectively. Decision trees were trained with morpho-syntactic and textual distance features to pre...

متن کامل

Optimization of MFNs for signal-based phrase break prediction

The automatic prosodic annotation of large speech corpora gains increasing consideration since appropriate databases for the training of prosodic models in speech synthesis and recognition are needed. On linguistic level, correct phrase and accent marking are essential processing steps. The authors developed a neural network based method for signal-based phrase break prediction and tested this ...

متن کامل

Prosodic Phrase Detection for Chinese Tts Using Cart and Statistical Model

Determination of prosodic phrase break from text is one of the important problems in generating good prosody for Chinese text-to-speech system. In this paper, we propose a statistical approach for detecting prosodic phrase breaks. Part-of-speech sequence information is used as the primary information. The history of the previous breaks is considered as constraint in this work. The probabilities...

متن کامل

طراحی و ارزیابی یک مدل بازسازی گفتار به روش هم‌گذاری واحدهای حساس به بافت نوایی

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. Thesyllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004